Effective Semisupervised Learning on Manifolds

نویسندگان

  • Amir Globerson
  • Roi Livni
  • Shai Shalev-Shwartz
چکیده

The abundance of unlabeled data makes semi-supervised learning (SSL) an attractive approach for improving the accuracy of learning systems. However, we are still far from a complete theoretical understanding of the benefits of this learning scenario in terms of sample complexity. In particular, for many natural learning settings it can in fact be shown that SSL does not improve sample complexity. Thus far, the only case where SSL provably helps, without compatibility assumptions, is a recent combinatorial construction of Darnstädt et al. (2013). Deriving similar theoretical guarantees for more commonly used approaches to SSL remains a challenge. Here, we provide the first analysis of manifold based SSL, where there is a provable gap between supervised learning and SSL, and this gap can be arbitrarily large. Proving the required lower bound is a technical challenge, involving tools from geometric measure theory. The algorithm we analyze is similar to subspace clustering, and thus our results demonstrate that this method can be used to improve sample complexity.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Semisupervised alignment of manifolds

In this paper, we study a family of semisupervised learning algorithms for “aligning” different data sets that are characterized by the same underlying manifold. The optimizations of these algorithms are based on graphs that provide a discretized approximation to the manifold. Partial alignments of the data sets—obtained from prior knowledge of their manifold structure or from pairwise correspo...

متن کامل

Humans Learn Using Manifolds, Reluctantly

When the distribution of unlabeled data in feature space lies along a manifold, the information it provides may be used by a learner to assist classification in a semi-supervised setting. While manifold learning is well-known in machine learning, the use of manifolds in human learning is largely unstudied. We perform a set of experiments which test a human’s ability to use a manifold in a semis...

متن کامل

Alignments of Manifold Sections of Different Dimensions in Manifold Learning

We consider an alignment algorithm for reconstructing global coordinates from local coordinates constructed for sections of manifolds. We show that, under certain conditions, the alignment algorithm can successfully recover global coordinates even when local neighborhoods have different dimensions. Our results generalize an earlier analysis to allow alignment of sections of different dimensions...

متن کامل

Learning Better Data Representation Using Inference-Driven Metric Learning

We initiate a study comparing effectiveness of the transformed spaces learned by recently proposed supervised, and semisupervised metric learning algorithms to those generated by previously proposed unsupervised dimensionality reduction methods (e.g., PCA). Through a variety of experiments on different realworld datasets, we find IDML-IT, a semisupervised metric learning algorithm to be the mos...

متن کامل

A High Accuracy Method for Semi-Supervised Information Extraction

Customization to specific domains of discourse and/or user requirements is one of the greatest challenges for today’s Information Extraction (IE) systems. While demonstrably effective, both rule-based and supervised machine learning approaches to IE customization pose too high a burden on the user. Semisupervised learning approaches may in principle offer a more resource effective solution but ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017